Abstract



A bit-string model for the evolution of a population of haploid
organisms, subject to competition, reproduction with mutation,
and selection, is studied using mean-field theory and Monte Carlo
simulations. We show that, depending on environmental flexibility
and genetic variability, the model exhibits a phase transition
between extinction and survival. The mean-field theory describes
the infinite-size limit, while simulations are used to study
quasi-stationary properties.
I. INTRODUCTION



Many mathematical models have been proposed to describe the evolution of
populations, focusing on varied aspects, for example, mutation accumulation
and adaptation. In the first case, how deleterious mutations
are passed to offspring, and the consequences for individual growth, are of
particular interest. In the second, the principal interest is the influence of
different environmental conditions on the population. One goal in this area
is the development of a simple model capable of describing the response of
a population to environmental mutability. Of interest, for example, is the
ability of a population to adapt to rapid changes in its environment. Penna's
bit-string model seems well suited to this purpose.

In this paper, we propose a model for the evolution of an adapting
population, to study the consequences of variation of conditions affecting survival,
related to environmental flexibility, and the genetic variability of the
population. Our main interest is to describe the conditions determining the
extinction or survival of the population. The population evolves in discrete time
with non-overlapping generations. It consists of haploid organisms defined by
their genotype (a bit-string of $G$ positions, or genes). The individuals undergo
asexual reproduction, subject to mutation, competition and selection.
Selection is represented through a survival probability that depends on the difference
between a genome and a certain ideal genome. Environmental changes can be
represented via alteration of this ideal. In the present study, however, the
ideal genome is fixed, allowing a systematic study of the effect of various other
parameters upon survival.

We develop a mean-field theory (MFT), which describes the evolution
of an infinite population exactly, since the model has no spatial structure. We also
perform Monte Carlo simulations of the model. The latter are useful for
studying fluctuations due to finite population size, which are not captured in the
MFT. We determine the survival/extinction phase boundary, and compare the
temporal evolution and the genomic distribution of the population predicted
by MFT against simulation results.

The paper is organized as follows. In Sec. II, we define the model and in Sec.
III develop the MFT. Sec. IV describes the Monte Carlo simulation algorithm,
while Sec. V reports MFT and simulation results. We present our conclusions
in Sec. VI.
II. MODEL



We study a model for the evolution of a population of haploid individuals
defined by their genomes, and subject to competition, asexual reproduction with
mutation, and selection. In this model, successive generations do not overlap.
Each individual is represented by a bit-string of $G$ positions (genes), denoted by
the vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_G)$, where $\sigma_i = 0$ or 1. The fitness of an individual
to the environment is measured in relation to a "model individual" (or "ideal
genome"), represented by the sequence $\sigma_i = 0$, $i = 1, \ldots, G$. Each gene in state
1 represents a reduction in fitness, and carries the same weight, independent
of its position $i$. Thus the Hamming distance from the ideal genome, given by
$H = \sum_i \sigma_i$, characterizes an individual's fitness. (This manner of characterizing
fitness has been used in several studies of age-structured populations.)
The dependence of fitness on $H$ is through the survival probability $S(H)$, Eq. (1).



$S(H)$ is the probability for an individual to survive up to the stage in which
she must compete with the rest of the population; individuals that survive the
competition stage go on to produce offspring, as detailed below. The
parameter $\tau$, which plays a role analogous to temperature in equilibrium
statistical mechanics, represents environmental flexibility, while $B$, which is related
to the genetic variability of the population, represents mutational tolerance.
$S(H) = 1$ for $H = 0$, and decays monotonically with $H$. We note that for
fixed $H$ and $B$, the survival probability is an increasing function of $\tau$, and that
for fixed $H$ and $\tau$, $S$ is an increasing function of $B$. The Fermi-like function
$S(H)$ was used in a similar manner in the model of Thoms et al. These
authors define a death probability $p_d = [e^{\beta(b-a)} + 1]^{-1}$, where $\beta$ is an inverse
temperature and $(b - a)$ represents the difference between the typical number
of mutations in the population and the number of mutations of the individual.

At reproduction, each organism is replaced by two offspring. The latter are
copies of their parent, with a certain number $m$ of mutations. Each position
has a probability $\lambda$ to mutate (mutations $0 \to 1$ and $1 \to 0$ are considered
equally likely), with mutations at different positions constituting independent
events. The number of mutations $m$ therefore follows a binomial distribution.
The mean number of mutations per reproduction event, $\lambda G$, is set to unity in
this study.
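The reproduction rule can be sketched in a few lines of Python. This is a minimal illustration, assuming genomes are stored as lists of 0/1 integers; the function names `mutate`, `reproduce` and `hamming` are hypothetical and not taken from the original work.

```python
import random

def mutate(genome, lam, rng):
    """Copy a genome, flipping each bit independently with probability lam."""
    return [bit ^ 1 if rng.random() < lam else bit for bit in genome]

def reproduce(parent, lam, rng):
    """Replace a parent by two independently mutated offspring."""
    return mutate(parent, lam, rng), mutate(parent, lam, rng)

def hamming(genome):
    """Hamming distance from the ideal genome (all zeros): the number of 1 bits."""
    return sum(genome)

rng = random.Random(0)          # seeded only to make the sketch repeatable
G = 128
parent = [0] * G                # the ideal genome
child_a, child_b = reproduce(parent, 1.0 / G, rng)   # lam * G = 1
print(hamming(child_a), hamming(child_b))
```

Because each position flips independently, the number of flipped bits per offspring is binomially distributed with mean $\lambda G$, as stated above.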

Competition amongst individuals is represented by the familiar Verhulst
factor,

$$V = 1 - \frac{N(t)}{N_{\max}}, \qquad (2)$$

where $N(t)$ is the population at time $t$ and $N_{\max}$ is the maximum capacity
of the environment. The evolution of the population proceeds by discrete
time steps: at each step, the Verhulst factor is applied by selecting at random
(independently of $H$) $NV$ survivors; the survivors go on to reproduce as
described above.
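A minimal sketch of the Verhulst stage follows. It assumes the per-individual variant used in the simulation algorithm of Sec. IV, in which each individual survives independently with probability $V$, so that the number of survivors equals $NV$ only on average. The names `verhulst_factor` and `verhulst_cull` are illustrative.

```python
import random

def verhulst_factor(n, n_max):
    """Verhulst factor V = 1 - N / N_max, Eq. (2)."""
    return 1.0 - n / n_max

def verhulst_cull(population, n_max, rng):
    """Keep each individual with probability V, independently of its genome."""
    v = verhulst_factor(len(population), n_max)
    return [individual for individual in population if rng.random() < v]

rng = random.Random(0)
population = list(range(10000))        # stand-ins for 10000 individuals
survivors = verhulst_cull(population, 40000, rng)
print(len(survivors))                  # close to N * V = 7500 on average
```

Selecting exactly $NV$ survivors (rounded to an integer) with `rng.sample` would be an equally reasonable reading of the rule; the two variants differ only in the fluctuations of the survivor count.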



III. MEAN-FIELD THEORY 

We have developed a mean-field description of the model defined above.
For this model, which has no spatial structure, the deterministic mean-field
description describes the infinite-size limit ($N_{\max} \to \infty$) exactly. Differences
between theory and simulation are due to fluctuations that appear in
finite-sized systems, but that are absent in the infinite-size limit.

In the full stochastic description there are $2^G$ distinct genomes $\sigma$, and an
integer-valued random variable $N_\sigma(t) \geq 0$ for each. Our first step in
constructing a simplified description is to reduce the set of variables to $N(H,t)$: the
number of individuals with Hamming distance $H$ from the ideal, at time $t$.
Since the model does not distinguish between individuals with the same
Hamming distance, the probability distribution at any time $t > 0$ will be a function
of $H$ only, if it is so at $t = 0$. We shall always suppose this to be the case.

In the mean-field theory, the discrete-time evolution of the population may
be written as:

$$N(H, t+1) = E[N(H, t+1) \,|\, \{N(H, t)\}], \qquad (3)$$

where $\{N(H, t)\}$ represents the entire set of population variables at step $t$. In
other words, the population at step $t+1$ is approximated by its expected value,
given the distribution at step $t$. (The latter, in turn, is given by the expected
distribution, given that for time $t - 1$, and so on.) The integer-valued random
variables of the exact description are therefore replaced by a set of real-valued,
deterministic variables.

Each step of the evolution consists of two stages: (1) death of individuals due
to competition for resources ('Verhulst stage'); (2) reproduction/selection. In
the Verhulst stage, the total population size $N = \sum_H N(H)$ is evaluated; then
each subpopulation is reduced by the same factor, $V = 1 - N/N_{\max}$, yielding
the values:

$$N'(H) = V N(H), \quad (H = 0, \ldots, G). \qquad (4)$$
Note that the Verhulst stage involves an interaction between individuals
($N'(H)$ is a nonlinear function of all of the $N(H)$), and that each
individual interacts equally with all others in this process.

In the reproduction stage each individual is replaced by a pair of offspring
that have, in general, Hamming distances different from those of the parent.
We assume independent, equally probable mutations at each site, so that the
number of mutations $m$ in a given reproduction event is binomially distributed:

$$P(m) = \binom{G}{m} \lambda^m (1-\lambda)^{G-m}. \qquad (5)$$

(Since $G \gg 1$ while the mean number of mutations $\lambda G$ is of order unity, we
may approximate $P(m)$ by a Poisson distribution in simulations; we retain the
binomial distribution in the MFT analysis.)
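The quality of the Poisson approximation is easy to check numerically. The sketch below compares Eq. (5) with a Poisson distribution of unit mean for $G = 128$ and $\lambda = 1/G$, the values used in this study; the helper names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(m, G, lam):
    """P(m) of Eq. (5): probability of exactly m mutations among G positions."""
    return comb(G, m) * lam**m * (1 - lam)**(G - m)

def poisson_pmf(m, mean):
    """Poisson probability of m events with the given mean."""
    return exp(-mean) * mean**m / factorial(m)

G = 128
lam = 1.0 / G                   # so that lam * G = 1
for m in range(6):
    print(m, round(binomial_pmf(m, G, lam), 4), round(poisson_pmf(m, 1.0), 4))
```

For these parameters the two distributions differ by less than 0.005 at every $m$, which is why the substitution is harmless in simulations.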

Each reproduction event may be represented schematically as $H' \to H_1, H_2$,
where $H'$ denotes the Hamming distance of the parent and $H_1$ and $H_2$ those
of the offspring. Since $H' \to H_1$ and $H' \to H_2$ are independent events (even
though they happen simultaneously), it suffices to consider one such, i.e., $H' \to H$;
let $W(H|H')$ represent its probability. If the offspring differs from its parent
at exactly $m$ positions, then

$$\max[0, H' - m] \leq H \leq \min[H' + m, G].$$



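Let $m = m_0 + m_1$, with $m_0$ the number of mutations $0 \to 1$ and $m_1$ the number
of type $1 \to 0$. Each event is characterized by $H'$, $m$, and $m_0$. (Evidently,
$H = H' + m_0 - m_1 = H' + 2m_0 - m$.) The probability of such an event is
given by the hypergeometric distribution:

$$p(m_0|m, G, H') = \frac{\binom{G-H'}{m_0} \binom{H'}{m-m_0}}{\binom{G}{m}}. \qquad (6)$$

Now using $m_0 = (H - H' + m)/2$, we have

$$W(H|H') = (G-H')!\, H'! \sum_m \frac{\lambda^m (1-\lambda)^{G-m}}{\left(\frac{H-H'+m}{2}\right)! \left(\frac{H'-H+m}{2}\right)! \left(G - \frac{H+H'+m}{2}\right)! \left(\frac{H'+H-m}{2}\right)!}, \qquad (7)$$

where the sum runs over those values of $m$ for which every factorial argument is a non-negative integer.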
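As a check on Eqs. (5)-(7), the transition probability can be evaluated directly as the sum over $m$ of $P(m)\,p(m_0|m,G,H')$. The sketch below does this with exact binomial coefficients; the function name `transition_prob` and the small test values are illustrative choices.

```python
from math import comb

def transition_prob(H, Hp, G, lam):
    """W(H|H'): probability that a parent at distance Hp has an offspring at distance H."""
    total = 0.0
    # m must have the same parity as H - H' and be at least |H - H'|.
    for m in range(abs(H - Hp), G + 1, 2):
        m0 = (H - Hp + m) // 2          # number of 0 -> 1 mutations
        m1 = m - m0                     # number of 1 -> 0 mutations
        if m0 > G - Hp or m1 > Hp:      # not enough 0 bits or 1 bits to flip
            continue
        p_m = comb(G, m) * lam**m * (1 - lam)**(G - m)            # Eq. (5)
        p_m0 = comb(G - Hp, m0) * comb(Hp, m1) / comb(G, m)       # Eq. (6)
        total += p_m * p_m0
    return total

G, lam = 16, 1.0 / 16
column_sum = sum(transition_prob(H, 5, G, lam) for H in range(G + 1))
print(column_sum)   # the probabilities over all H should sum to 1 up to rounding
```

Normalization over $H$ for every $H'$ is a useful sanity test before the matrix is used in the mean-field iteration.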



Next we observe that the expected number of surviving offspring with
Hamming distance $H$ produced by a parent with Hamming distance $H'$ is
$\tilde{W}(H|H') = 2 S(H) W(H|H')$. Thus the expected number of individuals with
Hamming distance $H$ at step $t+1$ is:

$$E[N(H, t+1) \,|\, \{N(H', t)\}] = \sum_{H'=0}^{G} \tilde{W}(H|H') N'(H'), \qquad (8)$$

where $N'(H')$ is the distribution just after the Verhulst step. The evolution of
the population is found via numerical iteration of Eqs. (4) and (8).
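The iteration of Eqs. (4) and (8) can be sketched as follows. The survival function is passed in as a parameter; the Fermi-like form `assumed_survival` below is a hypothetical stand-in that merely satisfies $S(0) = 1$ and decays with $H$, and should be replaced by Eq. (1). The matrix is built by enumerating $(m_0, m_1)$ directly, which is equivalent to Eq. (7). All parameter values in the demo are illustrative.

```python
from math import comb, exp

def transition_matrix(G, lam):
    """W[H][Hp]: probability that a parent at distance Hp yields an offspring at H."""
    W = [[0.0] * (G + 1) for _ in range(G + 1)]
    for Hp in range(G + 1):
        for m0 in range(G - Hp + 1):            # flips 0 -> 1
            for m1 in range(Hp + 1):            # flips 1 -> 0
                m = m0 + m1
                W[Hp + m0 - m1][Hp] += (comb(G - Hp, m0) * comb(Hp, m1)
                                        * lam**m * (1 - lam)**(G - m))
    return W

def assumed_survival(H, B, tau):
    """ASSUMPTION: a Fermi-like placeholder with S(0) = 1, not the paper's Eq. (1)."""
    x = (H - B) / tau
    if x > 700.0:                               # avoid overflow in exp
        return 0.0
    return (1.0 + exp(-B / tau)) / (1.0 + exp(x))

def mft_step(N, W, S, n_max):
    """One mean-field step: Verhulst stage, Eq. (4), then reproduction/selection, Eq. (8)."""
    G = len(N) - 1
    V = 1.0 - sum(N) / n_max
    N_prime = [V * n for n in N]
    return [2.0 * S(H) * sum(W[H][Hp] * N_prime[Hp] for Hp in range(G + 1))
            for H in range(G + 1)]

if __name__ == "__main__":
    G, n_max, B, tau = 32, 1.0, 4.0, 0.5        # illustrative values only
    W = transition_matrix(G, 1.0 / G)
    N = [0.05 * comb(G, H) / 2**G for H in range(G + 1)]   # random genomes
    for _ in range(200):
        N = mft_step(N, W, lambda H: assumed_survival(H, B, tau), n_max)
    print("density after 200 steps:", sum(N) / n_max)
```

With $S \equiv 1$ the total population obeys the logistic map $\rho \to 2\rho(1-\rho)$, whose fixed point $\rho = 1/2$ gives a quick correctness check on the implementation.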



IV. SIMULATION ALGORITHM 

We study the evolution of the model population in Monte Carlo simulations.
Initially, $N_0$ = Nmax/^O individuals of $G = 128$ bits are generated, each with
a random gene sequence, $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_G)$, where $\sigma_i = 0$ or 1 with equal
likelihood. The procedure is as follows:

i) The Verhulst factor $V = 1 - N(t)/N_{\max}$ is evaluated. Then for each
individual, a random number $s$ is generated; the individual survives (dies) if
$s < V$ ($s > V$).

ii) Each individual reproduces: 2 copies are created, with possible
mutations. The number of mutations $m$ is given by a random integer, chosen from
a Poisson distribution with parameter 1. The mutation loci are selected at
random.

iii) For each daughter, the Hamming distance $H$ from the ideal is evaluated,
and a random number $r$, uniform on [0,1], is generated. If $r < S(H)$, the
individual survives; otherwise, it dies.
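Steps i) to iii) can be sketched as a single generation update. This is a minimal illustration under stated assumptions: genomes are lists of 0/1, the survival function $S(H)$ is supplied by the caller, mutation loci are drawn without repetition, and the Poisson sampler uses Knuth's multiplication method. The names `poisson` and `mc_step` are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Sample a Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mc_step(population, n_max, survival, rng, mean_mutations=1.0):
    """Advance a list of genomes by one generation (steps i, ii, iii)."""
    if not population:
        return []
    G = len(population[0])
    V = 1.0 - len(population) / n_max
    # Step i: Verhulst stage, each individual survives with probability V.
    survivors = [g for g in population if rng.random() < V]
    next_generation = []
    for parent in survivors:
        for _ in range(2):                      # Step ii: two offspring per parent
            child = list(parent)
            m = min(poisson(mean_mutations, rng), G)
            for locus in rng.sample(range(G), m):   # ASSUMPTION: distinct loci
                child[locus] ^= 1
            # Step iii: selection on the Hamming distance H = number of 1 bits.
            if rng.random() < survival(sum(child)):
                next_generation.append(child)
    return next_generation
```

A run then consists of repeated calls to `mc_step`, recording `len(population)` and the Hamming distances at each step until either the population vanishes or the maximum time is reached.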

During the simulations, we record the population, the average Hamming
distance, the average survival probability,

$$\langle S(t) \rangle = \frac{1}{N(t)} \sum_{i=1}^{N(t)} S(H_i), \qquad (9)$$

and the survival rate, $S(t) = N(t)/N(t-1)$. (Note that in general $\langle S(t) \rangle < 1$,
while $S(t)$ may, in principle, take any nonnegative value, and is unity in
the stationary state.) Depending on the parameters $\tau$, $B$, and $N_{\max}$, the
population may survive until a certain maximum time ($t_{\max}$ = 30 000 steps
in the simulations), attaining a quasi-stationary state, or may go extinct. We
record the Hamming distance distribution in the quasi-stationary state.
V. RESULTS AND DISCUSSION



Depending on the values of $B$ and $\tau$ that characterize the survival
probability function $S(H)$, Eq. (1), the population either survives or goes extinct. In
the mean-field theory this is a sharp transition. In simulations, due to finite
population size, fluctuations into the absorbing state (population zero) are to
be expected. Indeed, for any finite system size the population must
eventually go extinct, if the process is permitted to continue indefinitely. We adopt
$t_{\max}$ = 30 000 as a convenient maximum time, allowing us to discriminate
between survival and extinction, and (in the former case) to study quasi-stationary
properties, except very near the transition, where, as noted, the sharp
distinction is blurred by fluctuations.

Fig. 1 shows the phase boundary between survival and extinction in the
$B$-$\tau$ plane, comparing the mean-field prediction against simulations using $N_{\max}$ =
10^, 10^ and 5 x 10^. As $N_{\max}$ is increased, the survival/extinction line found in
simulation approaches the MFT prediction, as expected. For small values of $\tau$
(a "hard" or inflexible environment), survival of the population requires high values
of $B$, the mutational tolerance. The mean-field survival/extinction line of the
diagram is obtained by fixing the parameter $\tau$ and measuring the stationary
population density $\rho = N/N_{\max}$ as a function of $B$. Near the transition, $\rho$
depends linearly on $B$: $\rho \propto B - B_c(\tau)$, as is normally the case in mean-field
descriptions of a continuous phase transition to an absorbing state [10]. The
line $B_c(\tau)$ is readily obtained via linear regression to the $\rho(B)$ data near the
transition. Note that $B_c = 0$ for $\tau > 0.192$. For $\tau \ll 1$, on the other hand,
$B_c \propto 1/\tau$. (When the mutation probability $\lambda$ is increased, the phase boundary is
displaced upward and to the right, enlarging the extinction region.) Fig. 2 is
a three-dimensional plot of the population density as a function of $B$ and $\tau$;
the extinction region is evident, as is the monotonic growth of $\rho$ with either
parameter.
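The regression step used to locate the critical value of $B$ can be sketched as an ordinary least-squares fit of the density against $B$, with the critical value read off as the point where the fitted line crosses zero. The function name `estimate_bc` is illustrative, and the data in the example are synthetic, generated from an exactly linear law; they are not results from this study.

```python
def estimate_bc(B_values, rho_values):
    """Fit rho = slope * (B - Bc) by least squares; return (Bc, slope)."""
    n = len(B_values)
    mean_B = sum(B_values) / n
    mean_rho = sum(rho_values) / n
    sxx = sum((b - mean_B) ** 2 for b in B_values)
    sxy = sum((b - mean_B) * (r - mean_rho)
              for b, r in zip(B_values, rho_values))
    slope = sxy / sxx
    intercept = mean_rho - slope * mean_B
    return -intercept / slope, slope       # zero crossing of the fitted line

# Synthetic example: rho = 0.2 * (B - 3.0), so the fit should return Bc near 3.0.
B_data = [3.1, 3.2, 3.3, 3.4]
rho_data = [0.2 * (b - 3.0) for b in B_data]
print(estimate_bc(B_data, rho_data))
```

Only points close to the transition, where the linear law holds, should be included in the fit.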

Fig. 3 presents a typical evolution of the population density $\rho(t)$. For $B$
and $\tau$ in the survival phase, the population exhibits a rapid initial decay and
then evolves to a quasi-stationary state. Simulation and MFT evolutions are
in good agreement, despite fluctuations in the former.

The quasi-stationary distribution of Hamming distances obtained in
simulation is compared in Fig. 4 with the stationary distribution predicted by
mean-field theory. In all cases, the distribution peaks near the mean value
$\langle H \rangle$, and has a generally Gaussian appearance. For fixed $\tau$, we observe
that $\langle H \rangle$ increases monotonically with $B$, attaining a plateau if $\tau$ is
sufficiently large. The plateau value is $\langle H \rangle \simeq 64$, i.e., half the genome size. For
fixed $B$, we observe that $\langle H \rangle$ increases with $\tau$, until attaining $\langle H \rangle = 64$.
The variance of the distribution behaves similarly. Its saturation value is about
32, giving a standard deviation $\sigma \simeq 5.7$. This is not surprising, given that $B$
and $\tau$ both represent tolerance of differences from the ideal genome. Fig. 5
shows the stationary values of $\langle H \rangle$ and $\sigma_H$ as functions of $B$, as predicted
by MFT; simulations yield very similar behaviour. In simulations, extinction
occurs at larger $B$ values than are predicted by MFT, due to finite-size effects,
as noted above; the difference between simulation and theory diminishes with
increasing system size.
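The saturation values quoted above coincide with those of a genome in which every bit is 0 or 1 with equal probability, for which the Hamming distance is binomially distributed with parameters $G$ and 1/2: mean $G/2$ and variance $G/4$. A short numerical check for $G = 128$:

```python
from math import comb, sqrt

G = 128
# Probability of Hamming distance H for a uniformly random genome.
pmf = [comb(G, H) / 2**G for H in range(G + 1)]
mean_H = sum(H * p for H, p in enumerate(pmf))
var_H = sum((H - mean_H) ** 2 * p for H, p in enumerate(pmf))
print(mean_H, var_H, sqrt(var_H))   # G/2, G/4 and sqrt(G/4), up to rounding
```

This is consistent with the interpretation that, for large tolerance, selection no longer constrains the genome and mutation alone randomizes it.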



VI. SUMMARY 



We propose a bit-string model of the evolution of a simple haploid population.
Similarly to previous studies, the model includes the effect of environmental
flexibility and tolerance to genetic differences on the survival probability.
Unlike previous works, we employ a survival probability that is a monotonically
increasing function of the parameters $B$ and $\tau$ that represent tolerance of
genetic difference between a given genome and the ideal. The model is studied
via computer simulations and mean-field theory, which are in good agreement.

The model, like many others in population dynamics or epidemic analysis,
exhibits a continuous transition between an active phase (survival) and an
absorbing one (extinction). We map out the phase boundary in the $B$-$\tau$
plane, and find clear evidence of mean-field-like critical behavior, as in other
population models lacking spatial structure. The mean-field description is
exact in the infinite-size limit, but provides no information regarding
fluctuations. On the other hand, simulations for parameter values in the active phase
yield information on the quasi-stationary state of a finite system ($N_{\max} < \infty$).
It is also of interest to obtain the lifetime of this quasi-stationary state, or,
equivalently, the mean first-passage time to extinction. Such information can
in principle be obtained from simulations, or from a probabilistic analysis of
finite populations, starting from the master equation. Given the large
number of random variables involved ($G+1$, if we assume that the
probability depends only on Hamming distance $H$), the multivariate Fokker-Planck
equation would seem the most convenient tool; theoretical analysis of finite
populations is left as a subject for future work. The simulation results reported
here should prove useful in testing such theories.

Another interesting direction for future study is the response of the
population to changes in the environment. Such changes can be represented by
variations in the ideal genome (as in earlier studies) and/or in the parameters
$\tau$, $B$, $\lambda$, and $N_{\max}$. A related question is that of transitions in the genome
distribution when two or more ideals (corresponding to distinct, well adapted
types in the fitness landscape) exist. Studies of these problems using the
bit-string model are in progress.
Figure Captions



Fig. 1. Survival/extinction phase boundary in the $B$-$\tau$ plane for $\lambda G = 1$. The
solid line is the MFT prediction; dashed lines represent simulation results for
$N_{\max}$ = 5 x 10^, 10^ and 10^ (bottom to top).

Fig. 2. Population density $\rho$ as a function of $B$ and $\tau$ from MFT. For $\tau > 0.192$,
the population survives for any value of $B$.

Fig. 3. Time evolution of the population density $\rho$ for $\tau = 0.1$ and $B = 4$, in
MFT (smooth curve) and simulation ($N_{\max}$ = 10^).

Fig. 4. Stationary Hamming-distance distribution for various parameters, as
indicated.

Fig. 5. Dependence of Hamming distance on $B$ for $\tau = 0.1$ in MFT. Central
line: mean Hamming distance, $\langle H \rangle$; upper and lower lines represent one
standard deviation above or below the mean.